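Stock Price Prediction
This notebook forecasts the daily volume-weighted average price (VWAP) of Bajaj Finserv stock from historical data, using an ARIMA model selected with pmdarima's auto_arima and rolling-window features as exogenous regressors.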
In [1]:
# Installing
!pip install plotly
Requirement already satisfied: plotly in c:\users\user\anaconda3\lib\site-packages (5.24.1)
Requirement already satisfied: tenacity>=6.2.0 in c:\users\user\anaconda3\lib\site-packages (from plotly) (9.0.0)
Requirement already satisfied: packaging in c:\users\user\anaconda3\lib\site-packages (from plotly) (24.2)
1. Setup
We begin by installing and importing the required libraries.
In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import pmdarima
import plotly.graph_objects as go
from pmdarima import auto_arima
from scipy import stats
In [3]:
# Data source
dataframe = pd.read_csv(r"C:\Users\User\Downloads\BAJAJFINSV.csv")
2. Load the Dataset
The CSV file of daily stock price data is read into a DataFrame; here we preview its first rows.
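As called in this notebook, `pd.read_csv` leaves the `Date` column as plain strings. A minimal sketch of one alternative, assuming the same column layout, is to parse the dates while loading so that the index becomes a `DatetimeIndex`. The in-memory sample below is only a stand-in for the real file and reuses three rows shown in this notebook.

```python
import io
import pandas as pd

# Stand-in for BAJAJFINSV.csv: a small subset of columns and rows (assumption).
sample_csv = io.StringIO(
    "Date,Symbol,Close,VWAP\n"
    "2008-05-26,BAJAJFINSV,509.10,548.85\n"
    "2008-05-27,BAJAJFINSV,554.65,572.15\n"
    "2008-05-28,BAJAJFINSV,640.95,618.37\n"
)

# parse_dates converts Date to datetime64; index_col makes it the index.
df = pd.read_csv(sample_csv, parse_dates=["Date"], index_col="Date")

print(type(df.index).__name__)
print(df.loc[pd.Timestamp("2008-05-27"), "VWAP"])
```

With a datetime index, plots get proper date axes and date-based slicing becomes available. The rest of the notebook would run unchanged, except that the separate `set_index('Date')` step would no longer be needed.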
In [4]:
# Overview
dataframe.head()
Out[4]:
| | Date | Symbol | Series | Prev Close | Open | High | Low | Last | Close | VWAP | Volume | Turnover | Trades | Deliverable Volume | %Deliverble |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2008-05-26 | BAJAJFINSV | EQ | 2101.05 | 600.00 | 619.00 | 501.0 | 505.1 | 509.10 | 548.85 | 3145446 | 1.726368e+14 | NaN | 908264 | 0.2888 |
| 1 | 2008-05-27 | BAJAJFINSV | EQ | 509.10 | 505.00 | 610.95 | 491.1 | 564.0 | 554.65 | 572.15 | 4349144 | 2.488370e+14 | NaN | 677627 | 0.1558 |
| 2 | 2008-05-28 | BAJAJFINSV | EQ | 554.65 | 564.00 | 665.60 | 564.0 | 643.0 | 640.95 | 618.37 | 4588759 | 2.837530e+14 | NaN | 774895 | 0.1689 |
| 3 | 2008-05-29 | BAJAJFINSV | EQ | 640.95 | 656.65 | 703.00 | 608.0 | 634.5 | 632.40 | 659.60 | 4522302 | 2.982921e+14 | NaN | 1006161 | 0.2225 |
| 4 | 2008-05-30 | BAJAJFINSV | EQ | 632.40 | 642.40 | 668.00 | 588.3 | 647.0 | 644.00 | 636.41 | 3057669 | 1.945929e+14 | NaN | 462832 | 0.1514 |
In [5]:
# Setting index
dataframe.set_index('Date',inplace=True)
3. Initial Data Exploration
Let's examine the summary statistics and check for missing values.
In [6]:
# Describe
dataframe.describe()
Out[6]:
| | Prev Close | Open | High | Low | Last | Close | VWAP | Volume | Turnover | Trades | Deliverable Volume | %Deliverble |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 3201.000000 | 3201.000000 | 3201.000000 | 3201.000000 | 3201.000000 | 3201.000000 | 3201.000000 | 3.201000e+03 | 3.201000e+03 | 2456.000000 | 3.201000e+03 | 3201.000000 |
| mean | 2755.864386 | 2760.382381 | 2803.614449 | 2716.731443 | 2758.781537 | 2758.657451 | 2761.156954 | 2.315312e+05 | 9.533424e+13 | 20892.811075 | 7.409510e+04 | 0.471614 |
| std | 2869.811765 | 2874.814173 | 2912.885262 | 2834.037357 | 2873.792614 | 2873.522615 | 2874.033545 | 4.402681e+05 | 2.176448e+14 | 32396.302068 | 1.464012e+05 | 0.218910 |
| min | 90.750000 | 88.150000 | 93.100000 | 88.150000 | 91.000000 | 90.750000 | 89.260000 | 4.570000e+02 | 1.376712e+10 | 149.000000 | 4.560000e+02 | 0.056200 |
| 25% | 527.900000 | 528.600000 | 542.600000 | 520.000000 | 527.950000 | 527.900000 | 531.270000 | 3.981100e+04 | 2.751053e+12 | 2951.750000 | 2.086300e+04 | 0.287400 |
| 50% | 1098.700000 | 1095.000000 | 1118.000000 | 1080.250000 | 1100.000000 | 1098.700000 | 1103.560000 | 9.995300e+04 | 1.090486e+13 | 9450.000000 | 4.159700e+04 | 0.469700 |
| 75% | 5121.900000 | 5120.000000 | 5199.800000 | 5042.800000 | 5115.000000 | 5125.100000 | 5127.510000 | 2.315400e+05 | 8.755946e+13 | 24439.750000 | 8.308900e+04 | 0.636000 |
| max | 11176.550000 | 11000.000000 | 11300.000000 | 10868.700000 | 11175.450000 | 11176.550000 | 11081.780000 | 6.271671e+06 | 3.394379e+15 | 312959.000000 | 3.804696e+06 | 1.000000 |
In [7]:
# Data cleaning
# 1- check for null
dataframe.isnull().sum()
Out[7]:
Symbol                  0
Series                  0
Prev Close              0
Open                    0
High                    0
Low                     0
Last                    0
Close                   0
VWAP                    0
Volume                  0
Turnover                0
Trades                745
Deliverable Volume      0
%Deliverble             0
dtype: int64
In [8]:
dataframe["Trades"].isna()
Out[8]:
Date
2008-05-26 True
2008-05-27 True
2008-05-28 True
2008-05-29 True
2008-05-30 True
...
2021-04-26 False
2021-04-27 False
2021-04-28 False
2021-04-29 False
2021-04-30 False
Name: Trades, Length: 3201, dtype: bool
4. Data Preprocessing
We drop the Trades column, which is missing for 745 of the 3,201 rows, check for duplicate rows, and plot the VWAP series.
In [9]:
dataframe[dataframe["Trades"].isna()]
Out[9]:
| Date | Symbol | Series | Prev Close | Open | High | Low | Last | Close | VWAP | Volume | Turnover | Trades | Deliverable Volume | %Deliverble |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2008-05-26 | BAJAJFINSV | EQ | 2101.05 | 600.00 | 619.00 | 501.00 | 505.10 | 509.10 | 548.85 | 3145446 | 1.726368e+14 | NaN | 908264 | 0.2888 |
| 2008-05-27 | BAJAJFINSV | EQ | 509.10 | 505.00 | 610.95 | 491.10 | 564.00 | 554.65 | 572.15 | 4349144 | 2.488370e+14 | NaN | 677627 | 0.1558 |
| 2008-05-28 | BAJAJFINSV | EQ | 554.65 | 564.00 | 665.60 | 564.00 | 643.00 | 640.95 | 618.37 | 4588759 | 2.837530e+14 | NaN | 774895 | 0.1689 |
| 2008-05-29 | BAJAJFINSV | EQ | 640.95 | 656.65 | 703.00 | 608.00 | 634.50 | 632.40 | 659.60 | 4522302 | 2.982921e+14 | NaN | 1006161 | 0.2225 |
| 2008-05-30 | BAJAJFINSV | EQ | 632.40 | 642.40 | 668.00 | 588.30 | 647.00 | 644.00 | 636.41 | 3057669 | 1.945929e+14 | NaN | 462832 | 0.1514 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2011-05-25 | BAJAJFINSV | EQ | 490.50 | 485.00 | 498.50 | 485.00 | 489.00 | 485.95 | 491.58 | 68329 | 3.358895e+12 | NaN | 17938 | 0.2625 |
| 2011-05-26 | BAJAJFINSV | EQ | 485.95 | 489.90 | 491.40 | 482.20 | 485.40 | 484.70 | 486.95 | 27605 | 1.344235e+12 | NaN | 8579 | 0.3108 |
| 2011-05-27 | BAJAJFINSV | EQ | 484.70 | 485.65 | 492.00 | 484.05 | 486.30 | 486.90 | 487.88 | 35212 | 1.717919e+12 | NaN | 11239 | 0.3192 |
| 2011-05-30 | BAJAJFINSV | EQ | 486.90 | 491.85 | 498.00 | 490.30 | 493.10 | 493.70 | 493.78 | 43441 | 2.145038e+12 | NaN | 13254 | 0.3051 |
| 2011-05-31 | BAJAJFINSV | EQ | 493.70 | 495.05 | 521.50 | 494.00 | 517.25 | 518.40 | 513.67 | 327035 | 1.679887e+13 | NaN | 67729 | 0.2071 |
745 rows × 14 columns
In [10]:
dataframe = dataframe.drop(columns=["Trades"])
In [11]:
# 2- Check for duplicate
dataframe.duplicated().sum()
Out[11]:
np.int64(0)
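The cleaning steps above drop `Trades` by name after inspecting it. As a minimal sketch of the same idea in a reusable form, the snippet below computes the share of missing values per column and drops any column above a threshold. The toy frame, the single non-missing `Trades` value, and the 20% threshold are all illustrative assumptions, not values taken from the notebook (where `Trades` is missing in roughly 23% of rows).

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: 'Trades' is missing for the earliest rows.
df = pd.DataFrame({
    "VWAP":   [548.85, 572.15, 618.37, 659.60],
    "Trades": [np.nan, np.nan, np.nan, 1200.0],  # 1200.0 is a made-up value
})

# Fraction of missing values in each column.
missing_share = df.isna().mean()

# Illustrative threshold (assumption): drop columns with more than 20% missing.
max_missing = 0.2
to_drop = missing_share[missing_share > max_missing].index.tolist()

cleaned = df.drop(columns=to_drop)
print(to_drop)
```

Dropping the column rather than the rows keeps the full price history, which matters here because the missing values are concentrated in the first three years of data.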
In [12]:
dataframe.columns
Out[12]:
Index(['Symbol', 'Series', 'Prev Close', 'Open', 'High', 'Low', 'Last',
'Close', 'VWAP', 'Volume', 'Turnover', 'Deliverable Volume',
'%Deliverble'],
dtype='object')
In [13]:
# Plotting
dataframe["VWAP"].plot(figsize=(15,5))
Out[13]:
<Axes: xlabel='Date'>
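5. Exploratory Data Analysis
We look at the distribution of VWAP (histogram, density estimate, and normal probability plot) and at price trends, including a candlestick chart of the first 50 trading days.
In [14]: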
sns.histplot(data = dataframe["VWAP"])
Out[14]:
<Axes: xlabel='VWAP', ylabel='Count'>
In [15]:
sns.kdeplot(data = dataframe["VWAP"], fill = True)
Out[15]:
<Axes: xlabel='VWAP', ylabel='Density'>
In [16]:
stats.probplot(x = dataframe["VWAP"], plot = plt)
Out[16]:
((array([-3.51908695, -3.27647555, -3.14237236, ..., 3.14237236,
3.27647555, 3.51908695]),
array([ 89.26, 93.99, 94.79, ..., 10486.75, 10980.4 , 11081.78])),
(np.float64(2583.530410402808),
np.float64(2761.156954076851),
np.float64(0.8981685068494004)))
In [17]:
cols = ['High', 'Low', 'Last',
'Close']
dataframe[cols].plot(figsize=(15,5), subplots = True)
Out[17]:
array([<Axes: xlabel='Date'>, <Axes: xlabel='Date'>,
<Axes: xlabel='Date'>, <Axes: xlabel='Date'>], dtype=object)
In [18]:
go.Figure(data=[go.Candlestick(x=dataframe.index[0:50],
open=dataframe['Open'][0:50],
close=dataframe['Close'][0:50],
high=dataframe['High'][0:50],
low=dataframe['Low'][0:50])])
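6. ARIMA Model Training
Before fitting an ARIMA model with pmdarima.auto_arima, we build rolling-mean and rolling-standard-deviation features over 3-day and 7-day windows to use as exogenous regressors, then split the data chronologically into training and test sets.
In [19]: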
# Rolling
lag_features = ['High', 'Low','Volume', 'Turnover']
dataframe[lag_features].head()
Out[19]:
| Date | High | Low | Volume | Turnover |
|---|---|---|---|---|
| 2008-05-26 | 619.00 | 501.0 | 3145446 | 1.726368e+14 |
| 2008-05-27 | 610.95 | 491.1 | 4349144 | 2.488370e+14 |
| 2008-05-28 | 665.60 | 564.0 | 4588759 | 2.837530e+14 |
| 2008-05-29 | 703.00 | 608.0 | 4522302 | 2.982921e+14 |
| 2008-05-30 | 668.00 | 588.3 | 3057669 | 1.945929e+14 |
In [20]:
# Rolling mean and standard deviation over 3-day and 7-day windows
for col in lag_features:
    dataframe[col+"window_mean_3"] = dataframe[col].rolling(window=3).mean()
    dataframe[col+"window_mean_7"] = dataframe[col].rolling(window=7).mean()
    dataframe[col+"window_std_3"] = dataframe[col].rolling(window=3).std()
    dataframe[col+"window_std_7"] = dataframe[col].rolling(window=7).std()
dataframe.head()
Out[20]:
| Date | Symbol | Series | Prev Close | Open | High | Low | Last | Close | VWAP | Volume | ... | Lowwindow_std_3 | Lowwindow_std_7 | Volumewindow_mean_3 | Volumewindow_mean_7 | Volumewindow_std_3 | Volumewindow_std_7 | Turnoverwindow_mean_3 | Turnoverwindow_mean_7 | Turnoverwindow_std_3 | Turnoverwindow_std_7 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2008-05-26 | BAJAJFINSV | EQ | 2101.05 | 600.00 | 619.00 | 501.0 | 505.1 | 509.10 | 548.85 | 3145446 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2008-05-27 | BAJAJFINSV | EQ | 509.10 | 505.00 | 610.95 | 491.1 | 564.0 | 554.65 | 572.15 | 4349144 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2008-05-28 | BAJAJFINSV | EQ | 554.65 | 564.00 | 665.60 | 564.0 | 643.0 | 640.95 | 618.37 | 4588759 | ... | 39.542003 | NaN | 4.027783e+06 | NaN | 773461.552524 | NaN | 2.350756e+14 | NaN | 5.682195e+13 | NaN |
| 2008-05-29 | BAJAJFINSV | EQ | 640.95 | 656.65 | 703.00 | 608.0 | 634.5 | 632.40 | 659.60 | 4522302 | ... | 59.042386 | NaN | 4.486735e+06 | NaN | 123703.660710 | NaN | 2.769607e+14 | NaN | 2.541759e+13 | NaN |
| 2008-05-30 | BAJAJFINSV | EQ | 632.40 | 642.40 | 668.00 | 588.3 | 647.0 | 644.00 | 636.41 | 3057669 | ... | 22.040039 | NaN | 4.056243e+06 | NaN | 865428.886510 | NaN | 2.588793e+14 | NaN | 5.614629e+13 | NaN |
5 rows × 29 columns
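The rolling windows in this notebook include the current day, so the feature for day t uses that day's High, Low, Volume, and Turnover. Those values would not be known when forecasting day t in practice. A commonly used variant, sketched here as an assumption rather than as what this notebook does, shifts the series by one day before rolling. Applying it would change the feature values and the results shown in later cells.

```python
import pandas as pd

# The first five 'High' prices shown in this notebook.
high = pd.Series(
    [619.00, 610.95, 665.60, 703.00, 668.00],
    index=pd.to_datetime(
        ["2008-05-26", "2008-05-27", "2008-05-28", "2008-05-29", "2008-05-30"]
    ),
    name="High",
)

# Same-day window (as in the notebook): day t's value includes day t itself.
same_day = high.rolling(window=3).mean()

# Lagged window: shift first, so day t only sees days t-3 through t-1.
lagged = high.shift(1).rolling(window=3).mean()

print(pd.DataFrame({"same_day": same_day, "lagged": lagged}))
```

The lagged column is the same-day column moved down by one row, at the cost of one extra leading NaN per feature.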
In [21]:
# Check for missing value
dataframe.isnull().sum()
Out[21]:
Symbol                   0
Series                   0
Prev Close               0
Open                     0
High                     0
Low                      0
Last                     0
Close                    0
VWAP                     0
Volume                   0
Turnover                 0
Deliverable Volume       0
%Deliverble              0
Highwindow_mean_3        2
Highwindow_mean_7        6
Highwindow_std_3         2
Highwindow_std_7         6
Lowwindow_mean_3         2
Lowwindow_mean_7         6
Lowwindow_std_3          2
Lowwindow_std_7          6
Volumewindow_mean_3      2
Volumewindow_mean_7      6
Volumewindow_std_3       2
Volumewindow_std_7       6
Turnoverwindow_mean_3    2
Turnoverwindow_mean_7    6
Turnoverwindow_std_3     2
Turnoverwindow_std_7     6
dtype: int64
In [22]:
dataframe.dropna(inplace=True)
dataframe.isnull().sum()
Out[22]:
Symbol                   0
Series                   0
Prev Close               0
Open                     0
High                     0
Low                      0
Last                     0
Close                    0
VWAP                     0
Volume                   0
Turnover                 0
Deliverable Volume       0
%Deliverble              0
Highwindow_mean_3        0
Highwindow_mean_7        0
Highwindow_std_3         0
Highwindow_std_7         0
Lowwindow_mean_3         0
Lowwindow_mean_7         0
Lowwindow_std_3          0
Lowwindow_std_7          0
Volumewindow_mean_3      0
Volumewindow_mean_7      0
Volumewindow_std_3       0
Volumewindow_std_7       0
Turnoverwindow_mean_3    0
Turnoverwindow_mean_7    0
Turnoverwindow_std_3     0
Turnoverwindow_std_7     0
dtype: int64
In [23]:
ind_features = ['Highwindow_mean_3', 'Highwindow_mean_7',
'Highwindow_std_3', 'Highwindow_std_7', 'Lowwindow_mean_3',
'Lowwindow_mean_7', 'Lowwindow_std_3', 'Lowwindow_std_7',
'Volumewindow_mean_3', 'Volumewindow_mean_7', 'Volumewindow_std_3',
'Volumewindow_std_7', 'Turnoverwindow_mean_3', 'Turnoverwindow_mean_7',
'Turnoverwindow_std_3', 'Turnoverwindow_std_7']
In [24]:
# Check the number of rows
# First 2400 rows for training, the remaining 795 for model testing
dataframe.shape
Out[24]:
(3195, 29)
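In [25]:
# Chronological train/test split; .copy() lets us add columns later
# without pandas raising a SettingWithCopyWarning
training_data = dataframe[0:2400].copy()
testing_data = dataframe[2400:].copy()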
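7. Forecasting
We fit the model on the training split with auto_arima, passing the rolling-window features as exogenous regressors. The fitted model is then used to forecast the test period.
In [26]: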
import warnings
warnings.filterwarnings('ignore')
In [27]:
model = auto_arima(y=training_data['VWAP'], X=training_data[ind_features], trace=True)
Performing stepwise search to minimize aic
 ARIMA(2,0,2)(0,0,0)[0] intercept   : AIC=21501.827, Time=3.42 sec
 ARIMA(0,0,0)(0,0,0)[0] intercept   : AIC=22553.316, Time=1.53 sec
 ARIMA(1,0,0)(0,0,0)[0] intercept   : AIC=21962.075, Time=1.62 sec
 ARIMA(0,0,1)(0,0,0)[0] intercept   : AIC=21621.879, Time=2.75 sec
 ARIMA(0,0,0)(0,0,0)[0]             : AIC=38832.057, Time=1.45 sec
 ARIMA(1,0,2)(0,0,0)[0] intercept   : AIC=21603.205, Time=3.25 sec
 ARIMA(2,0,1)(0,0,0)[0] intercept   : AIC=21569.177, Time=3.02 sec
 ARIMA(3,0,2)(0,0,0)[0] intercept   : AIC=21504.408, Time=3.43 sec
 ARIMA(2,0,3)(0,0,0)[0] intercept   : AIC=21496.743, Time=3.65 sec
 ARIMA(1,0,3)(0,0,0)[0] intercept   : AIC=21507.703, Time=3.41 sec
 ARIMA(3,0,3)(0,0,0)[0] intercept   : AIC=21498.531, Time=4.05 sec
 ARIMA(2,0,4)(0,0,0)[0] intercept   : AIC=21493.517, Time=3.97 sec
 ARIMA(1,0,4)(0,0,0)[0] intercept   : AIC=21504.634, Time=3.70 sec
 ARIMA(3,0,4)(0,0,0)[0] intercept   : AIC=21484.775, Time=4.44 sec
 ARIMA(4,0,4)(0,0,0)[0] intercept   : AIC=21489.653, Time=6.02 sec
 ARIMA(3,0,5)(0,0,0)[0] intercept   : AIC=21490.528, Time=4.20 sec
 ARIMA(2,0,5)(0,0,0)[0] intercept   : AIC=21488.259, Time=4.03 sec
 ARIMA(4,0,3)(0,0,0)[0] intercept   : AIC=21488.441, Time=3.99 sec
 ARIMA(4,0,5)(0,0,0)[0] intercept   : AIC=21494.072, Time=4.56 sec
 ARIMA(3,0,4)(0,0,0)[0]             : AIC=21482.774, Time=3.92 sec
 ARIMA(2,0,4)(0,0,0)[0]             : AIC=21491.514, Time=3.53 sec
 ARIMA(3,0,3)(0,0,0)[0]             : AIC=21496.532, Time=2.52 sec
 ARIMA(4,0,4)(0,0,0)[0]             : AIC=21487.653, Time=3.02 sec
 ARIMA(3,0,5)(0,0,0)[0]             : AIC=21488.528, Time=2.87 sec
 ARIMA(2,0,3)(0,0,0)[0]             : AIC=21494.744, Time=2.32 sec
 ARIMA(2,0,5)(0,0,0)[0]             : AIC=21486.260, Time=2.92 sec
 ARIMA(4,0,3)(0,0,0)[0]             : AIC=21486.442, Time=2.46 sec
 ARIMA(4,0,5)(0,0,0)[0]             : AIC=21492.073, Time=3.04 sec

Best model:  ARIMA(3,0,4)(0,0,0)[0]
Total fit time: 93.151 seconds
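The search reports the candidate with the lowest AIC as the best model. The selection rule can be illustrated with a few of the candidates copied from the trace above. This is only a sketch of the final comparison, not of how pmdarima decides which orders to try next.

```python
# (order, has_intercept, AIC) for a handful of candidates from the trace.
candidates = [
    ((2, 0, 2), True, 21501.827),
    ((0, 0, 0), True, 22553.316),
    ((3, 0, 4), True, 21484.775),
    ((3, 0, 4), False, 21482.774),
    ((2, 0, 5), False, 21486.260),
]

# Lower AIC is better: it rewards fit and penalises extra parameters.
best_order, best_intercept, best_aic = min(candidates, key=lambda c: c[2])

print(best_order, best_intercept, best_aic)  # (3, 0, 4) False 21482.774
```

Note that ARIMA(3,0,4) appears twice in the trace. The version without an intercept has one parameter fewer and a slightly lower AIC, which is why the reported best model carries no "intercept" label.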
In [28]:
# See the model
model
Out[28]:
ARIMA(3,0,4)(0,0,0)[0]
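For reference, the selected specification can be written out explicitly. The form below assumes the regression-with-ARMA-errors formulation used by the statsmodels SARIMAX class that pmdarima wraps; the symbols are introduced here for illustration. With $y_t$ the VWAP on day $t$, $x_t$ the vector of 16 rolling-window features, and $\varepsilon_t$ white noise, an ARIMA(3,0,4) model without intercept or differencing reads:

```latex
\begin{aligned}
y_t &= \beta^{\top} x_t + u_t, \\
u_t &= \phi_1 u_{t-1} + \phi_2 u_{t-2} + \phi_3 u_{t-3}
      + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2}
      + \theta_3 \varepsilon_{t-3} + \theta_4 \varepsilon_{t-4}.
\end{aligned}
```

Because $d = 0$, the series itself is not differenced. Under this formulation the trend in VWAP has to be captured by the regression term $\beta^{\top} x_t$, which is plausible here since the features are rolling averages of closely related prices.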
8. Visualization of Forecast
We forecast over the test period and plot the forecast against the actual VWAP.
In [29]:
# Model predict
forecast = model.predict(n_periods = len(testing_data), X = testing_data[ind_features])
forecast
Out[29]:
2400 5062.237053
2401 5067.151485
2402 5140.069665
2403 5181.652603
2404 5206.024391
...
3190 9986.682752
3191 10045.087341
3192 10286.942142
3193 10784.870960
3194 11150.087691
Length: 795, dtype: float64
In [32]:
# Model testing
testing_data['Forecast_ARIMA'] = forecast.values
testing_data[['VWAP', 'Forecast_ARIMA']].head()
testing_data[['VWAP', 'Forecast_ARIMA']].plot(figsize=(15,5))
Out[32]:
<Axes: xlabel='Date'>
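The plot gives a visual comparison only. A minimal sketch of a numeric check is shown below, using mean absolute error (MAE) and root mean squared error (RMSE). The helper name `forecast_errors` and the example numbers are hypothetical; in this notebook one would pass `testing_data['VWAP']` and `testing_data['Forecast_ARIMA']` instead.

```python
import numpy as np

def forecast_errors(actual, predicted):
    """Return (MAE, RMSE) for two equal-length sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    residuals = actual - predicted
    mae = float(np.mean(np.abs(residuals)))
    rmse = float(np.sqrt(np.mean(residuals ** 2)))
    return mae, rmse

# Hypothetical values, for illustration only.
mae, rmse = forecast_errors([100.0, 102.0, 104.0], [101.0, 102.0, 101.0])
print(f"MAE={mae:.4f}, RMSE={rmse:.4f}")
```

Both metrics are in the same units as VWAP. RMSE weights large misses more heavily than MAE, so a gap between the two points to a few large errors rather than a uniform offset.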
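9. Conclusion
This notebook demonstrates a simple workflow for time series forecasting of stock prices using ARIMA: loading and cleaning the data, building rolling-window features, selecting a model with auto_arima, and comparing the forecast with held-out data.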